Referring To Displays In Multimodal Interfaces
Authors
D. He, G. Ritchie and J. Lee

Abstract
A system which displays information graphically, and also allows natural language queries, should allow these queries to interrogate the displayed (visual) information. Ideally this would use some uniform method for processing queries both about the display and about the world model. Such a system would have to cope with ambiguities introduced by these two sources of information. These ambiguities, and a preliminary proposal for a system to deal with them, are the main topics of this paper.

1 Introduction
Projects which have attempted to integrate natural language (NL) with graphical displays (Bès and Guillotin, 1992; Neal and Shapiro, 1991; Pineda, 1989) have mainly focussed on one of two problems:
1. How can output text be coordinated with graphical information displayed on the screen?
2. How can pointing gestures be coordinated with NL input?
We are interested in a slightly different issue, namely: How can NL terms be used, in a relatively uniform way, to refer to visual objects on the screen as well as the objects (for example, database items) which they may denote?
The situation we have in mind is where the computer system has some stored knowledge base, database or model, and is able to graphically display selected items from that store. The user wishes to interact with the system, and may wish to ask questions which either allude to visual features of the display (e.g. Is the blue zone inside the city boundary?) or are directly about the meaning of the display (e.g. What does the blue marking represent?). Such queries require that the system have access to some representation of what is shown on the screen, and that this representation be amenable to NL or multimodal (MM) querying.[1]

[1] We shall discuss natural language, but with the assumption that working systems in a few years' time would operate with speech input.

2 World Model and Display Model
It is common in systems which present visual information on the screen (e.g. GISs) for there to be a display model. This is an explicit representation of what items are currently on the screen and what their characteristics are. It is distinct from the world model, which represents the facts about the world that the system holds, not all of which need be displayed on the screen. In such systems, the main role of the display model is to maintain the visual display in an orderly fashion, and to connect screen objects to world (or database) objects. It must be updated systematically as items appear, disappear or move on the screen. Very often, the display model is quite a low-level structure, as it performs basic housekeeping for the display.
Our proposal is that, for NL querying of the visual display to be possible, the display model must contain suitable high-level information in a form which is accessible to an NL front-end; preferably, this form would be similar to, or related to, the representation the NL front-end uses to access the world model.

3 Illustrative Examples
A non-spatial domain
It might seem that queries about the visual display would make sense only in a domain where spatial information is directly relevant, such as a street map or room plan. However, if an iconic display is being used to represent some non-spatial set of objects, it might still be desirable to use visual attributes to refer to these abstract icons. To make these remarks slightly more concrete, let us consider a (fictitious) example system. This system does not handle spatial information, but it uses iconic representations on the screen to convey database facts to the user. The application is a car-sales catalogue, in which a number of (presumably used/second-hand/pre-owned) cars are available for the user to browse through.
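The split between world model and display model can be made concrete with a minimal Python sketch of the car-sales setting. It is an illustration under stated assumptions, not the authors' design: the class names (`Car`, `Icon`, `DisplayModel`), the icon attributes (`colour`, `x`, `y`), the `denotes` link and the sample cars are all hypothetical. The sketch shows a display model that is updated as icons appear, disappear or move, and a lookup that takes a visual description either to the screen objects themselves or to the world objects they denote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    """World-model entry. All fields are illustrative assumptions."""
    car_id: str
    make: str
    price: int


@dataclass
class Icon:
    """Display-model entry: one item currently on the screen."""
    icon_id: str
    colour: str
    x: int
    y: int
    denotes: str  # car_id of the world object this icon stands for


class DisplayModel:
    """Tracks what is on the screen and links it to the world model."""

    def __init__(self):
        self._icons = {}

    def show(self, icon):
        # An item appears on the screen.
        self._icons[icon.icon_id] = icon

    def hide(self, icon_id):
        # An item disappears from the screen.
        self._icons.pop(icon_id, None)

    def move(self, icon_id, x, y):
        # An item changes position on the screen.
        icon = self._icons[icon_id]
        icon.x, icon.y = x, y

    def find(self, **visual):
        """Return icons whose visual attributes match all given values."""
        return [
            icon for icon in self._icons.values()
            if all(getattr(icon, key) == value for key, value in visual.items())
        ]


def denoted(display, world, **visual):
    """Map a visual description to the world objects its icons denote."""
    return [world[icon.denotes] for icon in display.find(**visual)]


# Hypothetical world model: c3 is known to the system but never displayed.
world = {
    "c1": Car("c1", "Ford", 4000),
    "c2": Car("c2", "Saab", 6500),
    "c3": Car("c3", "Fiat", 3000),
}

display = DisplayModel()
display.show(Icon("i1", "blue", 10, 20, denotes="c1"))
display.show(Icon("i2", "red", 40, 20, denotes="c2"))

blue_icons = display.find(colour="blue")             # the screen objects
blue_cars = denoted(display, world, colour="blue")   # what they represent
```

The same description ("blue") yields two different kinds of referent here, an icon or a car, which is the sort of ambiguity an NL front-end working over both models would need to resolve.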
Icons on the screen represent individual cars, and various characteristics of the icons convey attributes of the corresponding cars (Figure 1). The